Variability measures how well an individual score or group of scores represents the entire distribution.
There are three ways of measuring variability: the range, the Standard Deviation (SD), and the Variance.
## The Range
The range is most commonly defined as the difference between the upper limit and the lower limit. When used with [[Variables, Scales, Real Limits, Percentile ranks#^a50fc7|continuous scales]], however, it's defined as the upper real limit of the highest score minus the lower real limit of the lowest score. When used with [[Variables, Scales, Real Limits, Percentile ranks#^a50fc7|discontinuous scales]], the range is simply the number of categories.
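As a quick sketch (the dataset here is made up for illustration), the range of a set of scores is just the highest score minus the lowest:

```python
scores = [3, 7, 2, 9, 5]  # hypothetical data

# Range = highest score minus lowest score
value_range = max(scores) - min(scores)
print(value_range)  # → 7
```

Note that this uses only the two extreme scores, which is exactly the weakness described below.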
**The big issue with the range is that it doesn't take into account the scores in the middle. Like [[Measures of Central Tendency in Distributions|the mean, median, and mode]], it has issues as a measure in certain cases.**
### The Interquartile Range
Another variation of the range is the interquartile range. This measure of variability uses [[Measures of Central Tendency in Distributions#^a93746|the median]] as its measure of central tendency. It divides the data set into four equally sized groups. This is similar to [[Variables, Scales, Real Limits, Percentile ranks#^13693d|percentiles]], which are simply a type of quantile that divides the distribution into 100 equally sized segments. ^534739
Q1 represents the 25th percentile, Q2 the 50th, and Q3 the 75th percentile. The minimum and maximum are simply the smallest and largest scores. The ==Interquartile Range (IQR)== represents the middle 50% of the data: it's the distance from Q1 to Q3.
###### When should you use the IQR?
**The IQR is best used when the data is skewed or has outliers, because it depends only on the middle 50% of the distribution.**
If the data doesn't split evenly you can find the quartiles through [[Frequency Graphs#^c090f4|interpolation]].
**You can only find the IQR for ratio and interval data, and sometimes for ordinal data when the distances between ranks are meaningful.**
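A minimal sketch of finding the quartiles and the IQR in Python (the dataset is made up; `statistics.quantiles` interpolates between data points when the data doesn't split evenly, which is one of several common conventions):

```python
import statistics

scores = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical data

# Cut points that divide the data into four equally sized groups
q1, q2, q3 = statistics.quantiles(scores, n=4)

iqr = q3 - q1  # distance from Q1 to Q3: the middle 50% of the data
print(q1, q2, q3, iqr)  # → 2.25 4.5 6.75 4.5
```

Note that Q2 (4.5) is just the median of the data.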
### Box Plots
You can represent these quartiles, along with the minimum and maximum, using a box plot.
![[Pasted image 20220907160002.png]]
# Calculating for a Population
## Standard Deviation (Population)
^36d688
**Deviations, Squared Deviations and SS**
The ==deviation== is a measure of the distance a score is from the mean. It's calculated by subtracting the mean from some score, X. The problem is that adding up all of the deviations always equals zero, because the negatives cancel with the positives.
There are two ways we can account for this fact:
1. **Find the mean absolute deviations**
2. **Find the sum squared deviations**
The mean absolute deviation is found with this equation.
![[Pasted image 20220907145933.png]]
The ==SS== is the "sum of squares," short for the sum of squared deviations. Anytime "SS" appears in a statistical context, it means the sum of squared deviations from the mean.
![[Pasted image 20220907150045.png]]
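The ideas above can be sketched numerically (using a made-up population of N = 5 scores): the raw deviations sum to zero, while the absolute deviations and squared deviations do not.

```python
scores = [1, 2, 3, 4, 5]  # hypothetical population
mu = sum(scores) / len(scores)  # population mean = 3.0

deviations = [x - mu for x in scores]
print(sum(deviations))  # → 0.0 (positives cancel negatives)

mad = sum(abs(d) for d in deviations) / len(scores)  # mean absolute deviation
ss = sum(d ** 2 for d in deviations)                 # SS: sum of squared deviations
print(mad, ss)  # → 1.2 10.0
```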
Standard deviation (SD) is the square root of the variance and provides a measure of the standard, or average distance from the mean.
![[Pasted image 20220907150127.png]]
It's important to understand that this is the SD equation for the ==population==. To represent the population standard deviation we use σ. For a sample, the [[#^928190|calculation is done differently.]]
## The Variance (Population)
^891aad
==Variance== is the mean of the squared deviations from the mean. This means it equals SS divided by N, the number of scores in the population. We use σ² to represent the population variance.
![[Pasted image 20220902081558.png]]
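A quick check of these definitions (the population is made up; `statistics.pvariance` and `statistics.pstdev` divide by N, matching the population formulas):

```python
import statistics

population = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, N = 8

variance = statistics.pvariance(population)  # SS / N
sd = statistics.pstdev(population)           # square root of the variance
print(variance, sd)  # → 4.0 2.0
```

Here the mean is 5, the SS is 32, so σ² = 32 / 8 = 4 and σ = 2.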
## Two Equations Calculating SS (Population)
### Definitional Formula
The definitional formula is calculated by taking the sum of all of the squared deviations.
![[Pasted image 20220902082048.png]]
**The definitional formula is better to use when you are dealing with easy whole numbers rather than fractions and decimals**
### Computational Formula
The computational formula works by taking the sum of the squared X values and subtracting from it the sum of the X values, squared and then divided by N: SS = ΣX² − (ΣX)²/N.
![[Pasted image 20220902082217.png]]
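Both formulas give the same SS. A sketch with a made-up population:

```python
scores = [1, 2, 3, 4, 5]  # hypothetical population, N = 5
N = len(scores)
mu = sum(scores) / N  # population mean = 3.0

# Definitional formula: sum of squared deviations from the mean
ss_definitional = sum((x - mu) ** 2 for x in scores)

# Computational formula: ΣX² − (ΣX)² / N (no deviations needed)
ss_computational = sum(x ** 2 for x in scores) - sum(scores) ** 2 / N

print(ss_definitional, ss_computational)  # → 10.0 10.0
```

The computational formula avoids working with a fractional or decimal mean, which is why it's preferred when the mean isn't a whole number.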
# Calculating for a Sample
^928190
The equations you have seen so far are not appropriate when we are calculating descriptive statistics for a sample. This is because our sample might not be perfectly representative of the population, so there is sampling error to account for.
### Calculating SS (Sample)
For a sample, the definitional equation for calculating SS is exactly the same except for the notation: instead of using μ for the population mean we use M for the sample mean, and instead of using N for the population size we use n for the sample size.
![[Pasted image 20220902083611.png]]
The same is true of the computational formula.
![[Pasted image 20220902083701.png]]
## Calculating Variance and SD (Sample)
### Degrees of Freedom
For a population, the degrees of freedom are simply N. For a sample, we subtract 1 (df = n − 1) when calculating the variance and SD: because the sample mean M is computed from the scores themselves, only n − 1 of them are free to vary, and dividing by n would underestimate the breadth of the population.
### Variance and SD Equations
For a sample, the variance and SD equations change slightly to account for the fact that the sample will likely underestimate the degree of variability present in the population, since it's so much smaller. This is why we divide by n − 1 instead of n in each equation.
![[Pasted image 20220902083938.png]]
![[Pasted image 20220902083949.png]]
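`statistics.variance` and `statistics.stdev` use the n − 1 denominator, so they match the sample formulas (the sample here is made up):

```python
import statistics

sample = [1, 2, 3, 4, 5]  # hypothetical sample, n = 5, SS = 10

s2 = statistics.variance(sample)  # SS / (n - 1) = 10 / 4
s = statistics.stdev(sample)      # square root of the sample variance
print(s2)  # → 2.5

# Compare with the population formula, which divides by n:
print(statistics.pvariance(sample))  # → 2.0
```

The sample variance (2.5) is larger than what the population formula would give (2.0), reflecting the n − 1 correction.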
### How Variability is affected by change
Adding a constant to every score in a dataset doesn't change the SD, but multiplying every score (and therefore [[Measures of Central Tendency in Distributions#^deb895|the mean]]) by a constant means you have to multiply the standard deviation by that value as well.
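A quick illustration of both effects (the data is made up):

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, SD = 2.0

shifted = [x + 10 for x in scores]  # adding a constant moves every score equally
scaled = [x * 3 for x in scores]    # multiplying stretches the distances between scores

print(statistics.pstdev(scores))   # → 2.0
print(statistics.pstdev(shifted))  # → 2.0 (unchanged)
print(statistics.pstdev(scaled))   # → 6.0 (multiplied by 3)
```

Shifting leaves every deviation from the mean unchanged; scaling multiplies every deviation, and hence the SD, by the same factor.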
## Biased and Unbiased Statistics
==Unbiased statistics== are those that, on average, produce the same value as the corresponding population parameter. ^bff1ee
==Biased statistics== are those that, on average, consistently overestimate or underestimate the corresponding population parameter.
Standard deviation for a sample is a biased statistic because, on average, it underestimates the population standard deviation, though the bias shrinks as the sample size increases. However, [[Probability#^a3a2c3|the standard deviation of the distribution of sample means decreases as the sample size goes up]]. This standard deviation of the distribution of sample means is called the ==standard error of the mean==.
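The bias idea can be demonstrated exactly by averaging over every possible sample. Below is a sketch (a tiny made-up population of three scores, with all samples of size 2 drawn with replacement, using `fractions` for exact arithmetic): dividing by n underestimates σ² on average, while dividing by n − 1 matches it exactly, which is why the n − 1 variance is unbiased.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]  # hypothetical population
mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)  # σ² = 2/3

def sample_variance(sample, ddof):
    """Variance of a sample, dividing by n - ddof."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / (len(sample) - ddof)

samples = list(product(population, repeat=2))  # every possible sample of size 2

biased = sum(sample_variance(s, 0) for s in samples) / len(samples)
unbiased = sum(sample_variance(s, 1) for s in samples) / len(samples)

print(sigma2, biased, unbiased)  # → 2/3 1/3 2/3
```

On average the n-denominator variance (1/3) falls short of σ² (2/3), while the n − 1 version hits it exactly.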
# Z Scores
[[Z scores]]
^eb8a36